Skip to content

add fork limit and md5sum & filesize check #65

Merged
edsu7 merged 4 commits into main from calculate_md5_and_filesize on Mar 27, 2026

edsu7 (Collaborator) commented Mar 17, 2026

  • Make md5sum and fileSize optional, and generate these fields when they are missing
  • Add a fork limit to the SONG/SCORE processes to reduce auth calls
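A fork limit caps how many SONG/SCORE tasks run at the same time, which in turn caps how many authentication requests are in flight at once. In Nextflow this is usually expressed with the `maxForks` process directive; the PR text does not show which mechanism or value was actually used. The Python sketch below only illustrates the effect with a bounded thread pool: `FORK_LIMIT`, `upload_with_auth` and the analysis IDs are hypothetical stand-ins, not code from this repository.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cap, playing the role of a per-process fork limit.
FORK_LIMIT = 2

_lock = threading.Lock()
_active = 0
peak_concurrency = 0


def upload_with_auth(analysis_id: str) -> str:
    """Hypothetical stand-in for one SONG/SCORE task that authenticates once."""
    global _active, peak_concurrency
    with _lock:
        _active += 1
        peak_concurrency = max(peak_concurrency, _active)
    try:
        time.sleep(0.05)  # simulate the auth + upload round trip
        return f"{analysis_id}: done"
    finally:
        with _lock:
            _active -= 1


analysis_ids = [f"analysis_{i:03d}" for i in range(1, 7)]

# max_workers bounds how many tasks (and therefore auth calls) run concurrently;
# pool.map still returns results in input order.
with ThreadPoolExecutor(max_workers=FORK_LIMIT) as pool:
    results = list(pool.map(upload_with_auth, analysis_ids))

print(results)
print("peak concurrency:", peak_concurrency)
```

All six tasks still complete; the limit only changes how many overlap, trading some wall-clock time for a smaller burst of auth requests.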

edsu7 requested a review from lindaxiang on March 17, 2026 20:40
edsu7 (Collaborator, Author) commented Mar 17, 2026

To test:

See analysis_004 in tests/test_data/analysis_meta/file_metadata.tsv for a case with missing md5sum and fileSize.

```shell
nextflow run . \
    --study_id "PCGLST0001" \
    --path_to_files_directory "tests/test_data/genomics" \
    --file_metadata "tests/test_data/analysis_meta/file_metadata.tsv" \
    --workflow_metadata "tests/test_data/analysis_meta/workflow_metadata.tsv" \
    --analysis_metadata "tests/test_data/analysis_meta/analysis_metadata.tsv" \
    --outdir test_minimal \
    -profile test,docker,sd4h_dev \
    --token TOKEN
```
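Generating a missing md5sum or fileSize comes down to hashing the file and reading its size from disk. A minimal sketch, assuming each metadata row is a dict of TSV strings and that the md5 column is named `fileMd5sum` (an assumption; only `fileSize` appears in the PR). `calculate_filesize` is the name used in the PR diff, but its body here is a guess; `calculate_md5` and `fill_missing_fields` are hypothetical helpers.

```python
import hashlib
import os
import tempfile


def calculate_md5(file_path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file, read in chunks so large genomics files fit in memory."""
    digest = hashlib.md5()
    with open(file_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def calculate_filesize(file_path: str) -> int:
    """File size in bytes."""
    return os.path.getsize(file_path)


def fill_missing_fields(file_row: dict, file_path: str) -> dict:
    """Copy a metadata row, generating md5/fileSize only when blank or absent.

    The column name "fileMd5sum" is an assumption. Generated values are stored
    as strings to match the other TSV fields. Supplied values are not verified here.
    """
    row = dict(file_row)
    if not row.get("fileMd5sum"):
        row["fileMd5sum"] = calculate_md5(file_path)
    if not row.get("fileSize"):
        row["fileSize"] = str(calculate_filesize(file_path))
    return row


# Demo on a throwaway 12-byte file.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "wb") as handle:
    handle.write(b"hello world\n")

filled = fill_missing_fields({"fileName": "sample.txt", "fileSize": ""}, sample_path)
print(filled)
```

Values already present in the row are left untouched, so this only covers the "generate if missing" half; checking supplied values is a separate step.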

lindaxiang (Collaborator) commented

@edsu7, would you mind splitting the current PR into two: one that adds the fork limit and another for the md5 & filesize check? That way we can quickly merge the fork-limit change to unblock the current submissions and tests.

For the md5 & filesize check, I found an issue with payload/validate and may need more time to discuss the implementation with you.

- remove cross_check and move md5 verification to payload generation
- add filesize calculation to payload generation
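Moving md5 verification into payload generation lets a single step both fill in and check the checksum. A sketch under assumptions: `resolve_md5` and `Md5MismatchError` are hypothetical names, and raising an exception is just one way to make a wrong checksum fail the run; the actual payload code in this repository may differ.

```python
import hashlib
import os
import tempfile
from typing import Optional


class Md5MismatchError(ValueError):
    """Raised when a supplied md5sum does not match the file contents."""


def calculate_md5(file_path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file, read in chunks."""
    digest = hashlib.md5()
    with open(file_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def resolve_md5(supplied: Optional[str], file_path: str) -> str:
    """md5 for the payload: generated when blank, verified when supplied."""
    actual = calculate_md5(file_path)
    if not supplied:
        return actual  # absent in the metadata: use the computed value
    if supplied.strip().lower() != actual:
        raise Md5MismatchError(
            f"md5sum mismatch for {os.path.basename(file_path)}: "
            f"metadata has {supplied}, file hashes to {actual}"
        )
    return actual


# Demo on a throwaway file.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "wb") as handle:
    handle.write(b"hello world\n")

expected = hashlib.md5(b"hello world\n").hexdigest()
generated = resolve_md5(None, sample_path)
verified = resolve_md5(expected.upper(), sample_path)  # case-insensitive match

try:
    resolve_md5("0" * 32, sample_path)
    mismatch_detected = False
except Md5MismatchError:
    mismatch_detected = True
```

Because the hash has to be computed either way, checking a supplied value costs nothing extra compared with generating a missing one.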
edsu7 (Collaborator, Author) commented Mar 26, 2026

To test:

```shell
nextflow run . \
    --study_id "PCGLST0003" \
    --path_to_files_directory "tests/test_data/genomics" \
    --file_metadata "tests/test_data/analysis_meta/md5_fileSize_file_metadata.tsv" \
    --workflow_metadata "tests/test_data/analysis_meta/md5_fileSize_workflow_metadata.tsv" \
    --analysis_metadata "tests/test_data/analysis_meta/md5_fileSize_analysis_metadata.tsv" \
    --outdir test_minimal \
    -profile test,docker,sd4h_dev \
    --token TOKEN
```

Test case analysis_004 omits fileSize and md5sum, so both are generated.
Test case analysis_006 fails due to an incorrect md5sum.
Test case analysis_001 fails due to an incorrect fileSize.

```diff
- "fileName": file_row.get("fileName", None),
- "fileSize": int(file_row.get("fileSize")) if file_row.get("fileSize") and file_row.get("fileSize").isdigit() else None,
+ "fileName": file_name,
+ "fileSize": int(file_row.get("fileSize")) if file_row.get("fileSize") and file_row.get("fileSize").isdigit() else calculate_filesize(file_path),
```
Review comment (Collaborator):

We can add a similar verify_filesize() function.
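A sketch of what a `verify_filesize()` could look like, keeping the diff's rule that a blank or non-digit `fileSize` counts as missing and is replaced by `calculate_filesize(file_path)`. The signature, the `FileSizeMismatchError` class, and the choice to raise on a mismatch are assumptions, not the merged implementation.

```python
import os
import tempfile
from typing import Optional


class FileSizeMismatchError(ValueError):
    """Raised when a supplied fileSize does not match the size on disk."""


def calculate_filesize(file_path: str) -> int:
    """File size in bytes."""
    return os.path.getsize(file_path)


def verify_filesize(supplied: Optional[str], file_path: str) -> int:
    """fileSize for the payload: generated when missing, verified when supplied.

    Mirrors the diff: None, "" or any non-digit string counts as missing.
    """
    actual = calculate_filesize(file_path)
    if not (supplied and supplied.isdigit()):
        return actual
    if int(supplied) != actual:
        raise FileSizeMismatchError(
            f"fileSize mismatch for {os.path.basename(file_path)}: "
            f"metadata has {supplied}, file is {actual} bytes"
        )
    return actual


# Demo on a throwaway 12-byte file.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "wb") as handle:
    handle.write(b"hello world\n")

from_blank = verify_filesize("", sample_path)
from_metadata = verify_filesize("12", sample_path)

try:
    verify_filesize("13", sample_path)
    mismatch_detected = False
except FileSizeMismatchError:
    mismatch_detected = True
```

One consequence of reusing the `isdigit()` rule is that a malformed value such as "12.0" is silently recalculated rather than rejected; stricter validation would need an explicit error for non-numeric input.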

```
PCGLST0003 analysis_003 variantCall PART_003 SPEC_003 SAMPLE_003 EXP_003 Genomics Germline Single sample GRCh38 GENCODE v38
PCGLST0003 analysis_004 sequenceAlignment PART_004 SPEC_004 SAMPLE_004 EXP_004 Genomics GRCh38 GENCODE v38
PCGLST0003 analysis_005 sequenceExperiment PART_004 SPEC_004 SAMPLE_004 EXP_005 Genomics GRCh38 GENCODE v38
PCGLST0003 analysis_006 sequenceAlignment PART_004 SPEC_004 SAMPLE_004 EXP_004 Genomics GRCh38 GENCODE v38
```
Review comment (Collaborator):

Can we create new metadata test files for analysis, file and workflow?

edsu7 requested a review from lindaxiang on March 26, 2026 19:12
lindaxiang (Collaborator) left a comment:

Tested it, and it worked as expected.

edsu7 merged commit 6133c4f into main on Mar 27, 2026